Abstract. For the Hawkes process (a long-memory point process $P$ with
intensity rate $r(t) = \lambda\big(g_0(t) + \sum_{\tau<t,\,\tau\in S} h(t-\tau)\big)$),
some existence and stability properties are studied. The new approach to
the Hawkes process presented in this paper enlarges the class of conditions
under which uniqueness of the invariant distribution of the process can be
shown; unlike in previous results, $\lambda$ is allowed to be non-Lipschitz
and even discontinuous. Multi-type and other generalizations are provided.

1. Introduction

1.1. Hawkes Process Informally. This paper studies the Hawkes process $P$,
a particular type of time-homogeneous, self-exciting, locally finite point
process on $\mathbb{R}$ with long memory. A realization of $P$ is a random
locally finite subset $S$ of $\mathbb{R}$. Any locally finite point process
is characterized by its intensity rate $r(t,\omega)$, that is, the intensity
of the point process at time $t$ conditioned on the history of the process
up to time $t$. If we denote the random set of the point process by
$S(\omega)$, then this can be expressed as

$$ P\big[\#\big(S(\omega)\cap[t,t+\delta t]\big) > 0 \,\big|\, \mathcal{F}_t\big] = r(t,\omega)\,\delta t + o(\delta t) \qquad (1) $$

where $\mathcal{F}_t$ is the $\sigma$-field for $S_t(\omega) := S(\omega)\cap(-\infty,t)$
generated by the elementary events $\{S(\omega)\cap I \neq \emptyset\}$, where the
interval $I$ varies over all subintervals of $(-\infty,t)$.

It is convenient to view the process as a random subset of
$\mathbb{R}_+ := [0,\infty)$, and thus the intensity rate function
$r(t,\omega)$ is defined for $t \in \mathbb{R}_+$. The Hawkes process is
defined by three $\mathbb{R}_+$-valued functions on $\mathbb{R}_+$. Given
these three functions $\lambda, h, g_0$, the intensity rate is

$$ r(t,\omega) = \lambda\Big(g_0(t) + \sum_{\tau\in S_t} h(t-\tau)\Big) \qquad (2) $$

where $g_0$ is some initial condition.

Written above is an informal description of the process. The description
is similar to ones found in the literature except for the term $g_0$. Usually,
instead of starting from an initial condition $g_0$, the process is defined by
giving $S_0$, i.e. the points of the process before time 0. This can be done
by letting

$$ g_0(s) = \sum_{\tau\in S_0} h(s-\tau) \qquad (3) $$

where $S_0$ is a locally finite collection of points in $(-\infty,0)$. The
reason for this approach is not only a minute formal generalization of the
process but also a different perspective, which we describe informally in the
following paragraph and formally in Section 2. The perspective is to view the
Hawkes process as a solution of a stochastic partial differential equation
with jumps. In fact the Hawkes process corresponds to a solution of one of
the simplest such equations, where the rate of jumps depends only on $g(0)$,
the jump is always $h$, and the continuous part of the evolution is an
infinitesimal shift operator. Yet even this simplest example is hard to
analyze when it is non-linear.

There is a built-in invariance under time translation due to the form of (2).
The evolution of the impulse function $g_t : \mathbb{R}_+ \to \mathbb{R}_+$ is
defined by

$$ g_t(s) = g_0(t+s) + \sum_{\tau\in S_t} h(t+s-\tau) \qquad (4) $$

It is an $\mathcal{F}_t$-measurable Markov process (it does not contain all
the past information, for example when $h(t) = e^{-t}$). The next point
$\tau \in S$ after time $t$ has the conditional distribution

$$ P[\tau > t' \mid \mathcal{F}_t] = \exp\Big(-\int_0^{t'-t} \lambda(g_t(s))\,ds\Big), \quad t' \ge t, \qquad (5) $$

at which point the impulse function jumps up by $h(\cdot)$. Thus $g_t$ is in
itself a Markovian process.



1.2. Results. The two main results of this paper, Theorems 2 and 3,
generalize the Brémaud-Massoulié theorem [1] in different directions and prove
that under certain conditions on $\lambda$ and $h$, for a wide class
$\mathcal{C}$ of initial conditions $g_0$, the distributions of $g_t(\cdot)$
converge to the same limit. Furthermore, the limiting distribution is
supported on $\mathcal{C}$.

In addition to these theorems we also observe several facts well known
for attractive systems. These are collected in Proposition 1. Rigorous
statements of all results are in Sections 4 and 5.



1.3. Historical Perspective and References. Point processes were first
studied in [3] by Erlang in connection with queueing theory, and Hawkes
processes were first introduced in [6] to study self-exciting point processes.
For further references please see [2J E]. The results of this paper are the
first that consider non-Lipschitz $\lambda$ and $\lambda$ that has jumps. A
related line of study is that of large deviations: for non-Lipschitz
$\lambda$ but $h$ restricted to be a sum of exponentials see [13]; for general
$h$ but Lipschitz $\lambda$ see [14]; and for linear $\lambda$ with an
explicitly computed large deviation function and other limit theorems see [15].

Currently Hawkes processes are used to model many phenomena, ranging from
queues and population control, to mutations and virus spread, to defaults and
jumps in the markets, to neural activity and social networks, and finally to
modeling of artificial intelligence and creative thinking. Most real-world
applications involve multi-dimensional Hawkes processes, yet this paper
considers only the one-dimensional Hawkes process for purposes of clarity.
The rough scope of possible generalization to multi-dimensional Hawkes
processes (as well as generalized $\lambda$ and $h$) is outlined in Section 7.

1.4. Example. The simplest example comes from population control theory.
Consider an asexual population in a certain region as a subset of time, where
each point in time represents either the arrival of an immigrant into the
region or a birth inside the region, and immigrants are treated the same as
newborns. Further assume that the rate of immigration is $\alpha$ and that the
number of children of each member of the population has mean $\beta$, with each
child's birth date distributed as $h(t-X)\,dt$, where $X$ is the birth date of
the parent. This corresponds to the Hawkes process with
$\lambda(z) = \alpha + \beta z$ and $h$ as given. This is an example of a Hawkes
process where $\lambda$ is linear, and it also provides the relation to
Galton-Watson trees, which happens to be an easy way of seeing many properties
of Hawkes processes.

2. Rigorous definition of the process and results
2.1. Definition of Hawkes Process.

Definition 1. A Hawkes process is a random collection $P$ of points in
$\mathbb{R}_+ := [0,\infty)$ characterized by a triplet of functions
$(\lambda,h,g_0)$, where $\lambda,h,g_0 : [0,\infty)\to[0,\infty)$. The
conditional Poisson intensity at time $t$ is

$$ r_t := \lambda\Big(g_0(t) + \sum_{0<\tau<t,\ \tau\in S} h(t-\tau)\Big) \qquad (6) $$

where the sum is over all previous points in $S_t := S\cap[0,t]$. The function
$g_0$ describes the initial condition and the functions $\lambda$ and $h$
describe the evolution of the process. We denote this Hawkes process by the
quadruple $(P,\lambda,h,g)$.

We will define the coupling of these different measures $P^g$ by the canonical
construction in subsection 3.1.

It is convenient to set up the state space $X$ of locally $L^1$,
$\mathbb{R}_+$-valued functions on $\mathbb{R}_+$. Let $\mathcal{M}(X)$ be the
space of probability distributions on $X$. Then the process can be realized as
an $X$-valued process. Start from $g_0 \in X$ and consider the random evolution
determined by the semi-group $T_t$ defined by $g_{t+s} = T_t g_s$. The
evolution has two components: a deterministic flow $(T_t g)(x) = g(t+x)$ for
$t$ up to a stopping time $\tau$, at which point
$(T_\tau g)(x) = g(\tau+x) + h(x)$, with the distribution of $\tau = \tau(g)$
given by

$$ P[\tau > t] = \exp\Big(-\int_0^t \lambda(g(s))\,ds\Big) \qquad (7) $$



After $\tau$, the semi-group continues as before and a new clock starts. Each
time the stopping time occurs there is a point in $P$. The generator of the
semi-group $T_t$ is given by

$$ \mathcal{A} := \mathcal{D} + \lambda(g(0))\,(\theta_h - \mathrm{Id}) \qquad (8) $$

where $\theta_h$ is a shift operator defined by $(\theta_h F)(g) = F(g+h)$ and
$\mathcal{D}$ is the derivative of the push-forward of the time evolution,
defined by

$$ \mathcal{D}F(g) := \lim_{\epsilon\to 0} \frac{F(\theta_\epsilon g) - F(g)}{\epsilon} \qquad (9) $$

where $\theta_\epsilon$ is again a shift operator, defined by
$\theta_\epsilon g(s) = g(s+\epsilon)$. It might be slightly easier to
understand the infinitesimal form $\mathcal{A}_\delta$ of the generator
$\mathcal{A}$ acting on a test functional $F$:

$$ \mathcal{A}_\delta F(g) := F(\theta_\delta g) - F(g) + \lambda(g(0))\,\big(F(g+h) - F(g)\big)\,\delta + o(\delta) \qquad (10) $$

Then this $X$-valued process $g_t$ satisfies

$$ g_t(s) = g_0(t+s) + \sum_{\tau\in S_t} h(t+s-\tau). \qquad (11) $$

For any initial condition $g_0(\cdot)$, we have a random Markovian evolution
$g_t(\cdot)$. Then $\lambda(z_t)$ will be the intensity of the point process,
where

$$ z_t := g_t(0) = g_0(t) + \sum_{\tau\in S_t} h(t-\tau). \qquad (12) $$

It is sometimes more convenient to consider, instead of $S_t$, the process

$$ Q_t = (N_t, g_t) \qquad (13) $$

where $N_t = \#(S_t)$ is the number of jumps from time 0 to time $t$. Then
$Q_t$ can be viewed as a point process via $N_t$ and as a Markovian process
via $g_t$.

Remark 1. We assume $\int h(t)\,dt = 1$ without loss of generality. Indeed
one can always achieve this if $\|h\|_1 = \int h(t)\,dt < \infty$ by observing
that the triples $(\lambda(z), h(t), g(t))$ and
$(\lambda(z\|h\|_1), h(t)/\|h\|_1, g(t)/\|h\|_1)$ produce the same Hawkes
process with the same distribution of $P$.

On the other hand, if $\|h\|_1 = \infty$ and
$\inf_{z\in\mathbb{R}_+}\lambda(z) > 0$, then $\lim_{t\to\infty} g_t(0) = \infty$
a.s., and hence the stationary version of the process is going to be a Poisson
process of intensity $\lim_{z\to\infty}\lambda(z)$, if such a limit exists.

2.2. Metrics and Measures. We equip the space $X$ with the $L^1_{loc}$ metric

$$ \forall g,f\in X,\quad d_X(g,f) = \sum_{T=1}^{\infty} 2^{-T}\,\frac{\int_0^T |g(s)-f(s)|\,ds}{1+\int_0^T |g(s)-f(s)|\,ds} \qquad (14) $$

The weak convergence in Proposition 1 will take place in the space of
distributions on $X$.

Definition 2. Let $D(X)$ be the Skorohod space of $X$-valued processes equipped
with the Skorohod $J_1$-metric $d$ and the $\sigma$-field $\mathcal{G}$
generated by open balls in $d$, and let $\{\mathcal{G}_s\}$ be the
corresponding filtration. Let $g^*$ be the non-initial part of the impulse
function:

$$ g^*_t(s) = \sum_{\tau\in S_t} h(t+s-\tau) \qquad (15) $$

and let

$$ \forall g,\tilde g\in D(X),\quad d^*(g,\tilde g) := d(S(g),S(\tilde g)) = d(g^*,\tilde g^*) \qquad (16) $$

be the pull-back of $d$ under the map $S : D(X)\to D(X)$, $S : g \mapsto g^*$,
defined according to (15) as

$$ (S(g))_t(s) = g_t(s) - g_0(t+s) \qquad (17) $$

Let $\mu_{s,f}$ be the distribution of $g_s$, the impulse function at time
$s$, starting from the initial condition $f$.

The distribution of the whole process $g$ starting with initial condition $g_0$
will be denoted by $P_{g_0}$ and the associated expectation values by
$E_{g_0}$. If the initial condition is random and given via some measure
$\mu$, then we denote

$$ P_\mu = \int P_f\,\mu(df), \qquad E_\mu = \int E_f\,\mu(df). \qquad (18) $$

Similarly $P^*_{g_0}, E^*_{g_0}, P^*_\mu, E^*_\mu$ will be used as the analogous
expressions for $g^*$.

In Theorems 2 and 3 we are going to consider the total variation distance
$d_{TV}$ of $P^*$ starting from two different initial conditions, which can
also be viewed as the total variation of the corresponding $P$ but under a
different $\sigma$-field $\mathcal{G}^* = \mathcal{G}\circ S^{-1}$.

Furthermore let $d_{TV,T}$ be the total variation after time $T$, that is, the
total variation with $\sigma$-algebra
$\mathcal{G}\circ(S|_{[T,\infty)})^{-1}$, where the operator $S|_{[T,\infty)}$
is defined by

$$ (S|_{[T,\infty)}(g))_t(s) = 1_{t>T}\,\big(g_t(s) - g_0(t+s)\big) \qquad (19) $$
3. Tools 

The two tools we are going to use are coupling and the parent-child structure.
These two allow one to get the first intuitions about the process rather
quickly, since the process cannot be "too different" from the example we have
given in subsection 1.4. The coupling is done via the canonical coupling of
point processes, which we discuss first.

3.1. Canonical Construction. One way to construct point processes is to
use a canonical space. To any measure $m$ one can associate a Poisson process
that, for any measurable $A$, assigns a random number of points in $A$ with
Poisson distribution of rate $m(A)$. In our case the canonical measure $P$
will be the Poisson point process corresponding to Lebesgue measure on
$\mathbb{R}_+\times\mathbb{R}_+$.

This process is defined by the property that any region
$A \subset \mathbb{R}_+\times\mathbb{R}_+$ of area $|A|$ has a number of points
distributed according to Poisson with rate $|A|$. Given $\lambda, h, g$, the
measure $P$ induces a measure $P^{g,\lambda,h}$ consistent with the definition
above by letting $\tau$ be part of the corresponding point process $P$ if there
is a point on the strip $\tau\times[0,\lambda(z_\tau))$ of the canonical plane
$\mathbb{R}_+\times\mathbb{R}_+$. In the rest of the paper we will denote
$P^g = P^{g,\lambda,h}$, since $\lambda$ and $h$ will be the same unless stated
otherwise.

This induces a canonical coupling on the collection of the measures $(P^g)$
indexed by initial conditions $g$. This coupling is in no way unique, but it
is maximal for non-decreasing $\lambda$ and two different initial conditions
one of which is strictly larger than the other (in the point-wise partial
ordering). The coupling is maximal since the number of points in the
difference of these two processes will be stochastically dominated by that of
any other coupling.
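The construction above can be sketched in code. The following is a minimal illustration, not the paper's algorithm: it reads the canonical plane from left to right by thinning, under the additional assumptions (not required by the text) that $\lambda$ is non-decreasing and that $h$ and $g_0$ are non-increasing, so that the intensity just after time $t$ bounds the intensity until the next point. The function name `simulate_hawkes` and all parameter values are hypothetical.

```python
import math
import random


def simulate_hawkes(lam, h, g0, horizon, rng):
    """Return the points of a Hawkes process on [0, horizon) by thinning.

    Assumes lam is non-decreasing and h, g0 are non-increasing, so that the
    intensity right after time t bounds the intensity until the next point.
    """
    points = []

    def z(s):
        # z_s = g0(s) + sum of h(s - tau) over the points accepted so far
        return g0(s) + sum(h(s - tau) for tau in points)

    t = 0.0
    while True:
        bound = lam(z(t))  # height of the strip that has to be scanned
        if bound <= 0.0:
            break  # zero intensity cannot recover without new points
        t += rng.expovariate(bound)  # next canonical point below `bound`
        if t >= horizon:
            break
        # That canonical point has height Uniform(0, bound); keep it only if
        # it lies inside the strip [0, lam(z_t)).
        if rng.random() * bound < lam(z(t)):
            points.append(t)
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    pts = simulate_hawkes(lambda x: 1.0 + 0.5 * x, lambda s: math.exp(-s),
                          lambda s: 0.0, 50.0, rng)
    print(len(pts), "points on [0, 50)")
```

Scanning only below the current bound is equivalent to reading the full plane, because canonical points above the strip are never accepted.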

3.2. Coupling / Stochastic Domination. Here we present the particular
coupling that we use.

Lemma (Stochastic Domination). Suppose $(P,\lambda,h,g_0)$ and
$(\tilde P,\lambda,\tilde h,\tilde g_0)$ are two Hawkes processes satisfying

$$ \forall x\in\mathbb{R}_+,\quad h(x)\le\tilde h(x),\ \ g(x)\le\tilde g(x) \qquad (20) $$

$$ \forall x<y,\quad \lambda(x)\le\lambda(y) \qquad (21) $$

Then there exists a coupling such that $P\subset\tilde P$, i.e. $S\subset\tilde S$.
Note: $h, \tilde h$ are not necessarily normalized.

Proof. If $a(t), b(t)$ are two intensities and $a(t)\le b(t)$ for all $t$, one
can couple the point processes such that the two processes jump together with
rate $a(t)$ and the second one jumps by itself at rate $b(t)-a(t)$. This can
also be done if $\forall\omega\in\Omega$, $a(t,\omega)\le b(t,\omega)$. In fact
this is done naturally by our construction. Hence it remains to prove that

$$ \lambda(g_t(0)) \le \lambda(\tilde g_t(0)). \qquad (22) $$

We know that up to the first jump in $\tilde P$ we have $g_t(0) = g_0(t)$ and
$\tilde g_t(0) = \tilde g_0(t)$. However, we know from (20) and (21) that

$$ \lambda(g_0(t)) \le \lambda(\tilde g_0(t)) \qquad (23) $$

which in turn implies that (22) holds up to the first jump of $\tilde P$. Then
we claim by induction that it holds for all times. Indeed, at the time of a
jump the order $g\le\tilde g$ is preserved, because there is no jump in $P$
before the first jump of $\tilde P$, while since $h\le\tilde h$, the jump of
$\tilde g$ is larger. □
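The domination lemma can be checked numerically on a single realization of the canonical plane. The sketch below is only an illustration under a simplifying assumption that is not in the text: $\lambda$ is bounded by a known constant `height`, so a finite window of the plane suffices. The helper names `poisson_plane` and `read_off` and all numerical choices are hypothetical.

```python
import math
import random


def poisson_plane(horizon, height, rng):
    """Unit-rate Poisson points on [0, horizon) x [0, height), sorted by time."""
    pts = []
    t = 0.0
    while True:
        t += rng.expovariate(height)  # time gaps of a rate-`height` process
        if t >= horizon:
            return pts
        pts.append((t, rng.uniform(0.0, height)))


def read_off(plane, lam, h, g0):
    """Keep the canonical points lying in the strip [0, lam(z_t)).

    Requires lam(z) <= height of the plane window for all z.
    """
    accepted = []
    for t, y in plane:
        z = g0(t) + sum(h(t - tau) for tau in accepted)
        if y < lam(z):
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    lam = lambda v: min(0.5 + v, 4.0)  # non-decreasing and bounded by 4
    plane = poisson_plane(30.0, 4.0, random.Random(1))
    small = read_off(plane, lam, lambda s: 0.5 * math.exp(-s), lambda s: 0.0)
    big = read_off(plane, lam, lambda s: 0.8 * math.exp(-s),
                   lambda s: math.exp(-s))
    print(set(small) <= set(big))  # the smaller process is a subset
```

Both processes are read from the same plane, which is exactly the canonical coupling; the inclusion then holds path by path, not merely in distribution.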

3.3. Parent-Child Structure and Branching representation.

We add the following parent-child structure to obtain a random forest
structure embedded in time. This is useful due to the connection to
Galton-Watson trees, or branching processes.

Start with a Hawkes process with functions $(\lambda,h,g_0)$ and let
$P = \{\tau_1,\tau_2,\ldots\}$, where $\tau$ is an increasing sequence, i.e.
$i<j \implies \tau_i<\tau_j$. Let us also denote

$$ \lambda_0(z) := \lambda(z) - \lambda(0) \qquad (24) $$

Now to each $\tau_i$ we associate a randomly chosen parent element
$p(\tau_i)$ from $P\cup\{-\infty\}$, where $-\infty$ represents having no
parent and being a root node. Given a sequence $(\tau_1,\tau_2,\ldots,\tau_i)$,
$p(\tau_i)$ is an independent random variable with the following law:

$$ P[p(\tau_i) = -\infty \mid \tau_1,\ldots,\tau_i] = \frac{\lambda(0)}{\lambda(z_{\tau_i})} + \frac{\lambda_0(z_{\tau_i})}{\lambda(z_{\tau_i})}\,\frac{g(\tau_i)}{z_{\tau_i}} \qquad (25) $$

$$ P[p(\tau_i) = \tau_j \mid \tau_1,\ldots,\tau_i] = 1_{j<i}\,\frac{\lambda_0(z_{\tau_i})}{\lambda(z_{\tau_i})}\,\frac{h(\tau_i-\tau_j)}{z_{\tau_i}} \qquad (26) $$

Note that $p(\tau_i)$ is a collection of mutually independent but not
identically distributed variables.
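As a sanity check on the parent law, the sketch below evaluates the probabilities for one point and confirms that they sum to one. It assumes the law reads $P[\text{root}] = (\lambda(0) + \lambda_0(z)\,g(\tau_i)/z)/\lambda(z)$ and $P[\text{parent} = \tau_j] = \lambda_0(z)\,h(\tau_i-\tau_j)/(z\,\lambda(z))$ with $z = z_{\tau_i}$; the function name `parent_law` and the sample inputs are hypothetical.

```python
import math


def parent_law(points, i, lam, h, g0):
    """Parent distribution of points[i] given the earlier points.

    Returns (p_root, p_parents), where p_parents[j] is the probability that
    points[j], j < i, is the parent of points[i].
    """
    t = points[i]
    contrib = [h(t - points[j]) for j in range(i)]
    z = g0(t) + sum(contrib)  # z_{tau_i}, the argument of lam at the point
    if z <= 0.0:
        return 1.0, [0.0] * i  # nothing to attribute: the point is a root
    lam0 = lam(z) - lam(0.0)   # lambda_0(z) as in (24)
    p_root = (lam(0.0) + lam0 * g0(t) / z) / lam(z)
    p_parents = [lam0 * c / (z * lam(z)) for c in contrib]
    return p_root, p_parents


if __name__ == "__main__":
    pts = [0.5, 1.2, 2.0, 2.7]
    p_root, p_par = parent_law(pts, 3, lambda v: 1.0 + 0.5 * v,
                               lambda s: math.exp(-s),
                               lambda s: 0.3 * math.exp(-s))
    print(p_root, p_par, p_root + sum(p_par))
```

The sum is one because $g(\tau_i) + \sum_{j<i} h(\tau_i-\tau_j) = z_{\tau_i}$ and $\lambda(0) + \lambda_0(z) = \lambda(z)$.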



Figure 1 provides a particular visualization of the parent-child structure
that is consistent with the above description. We are specifically interested
in the linear case, which provides us with a dual description of the same
process, as described in the following tool:

Lemma (Branching Process Equivalence). When
$\lambda(z) = \bar\lambda(z) = A + Bz$, roots have rate $A + Bg(t)$ and each
tree is a branching process with the number of branches having distribution
Poisson($B$). The distribution of the age of any parent $\tau_j$ at the time of
birth of any child $\tau$ is then given by

$$ P[\tau-\tau_j > t] = \int_t^\infty h(s)\,ds. \qquad (27) $$

Figure 1. Parent-child structure for $\lambda(z) = 1+z$. Each region
corresponds to the children of the point of the corresponding color.



Proof. By the above scheme we see that the intensity of roots is given by

$$ P[p(\tau_i) = -\infty]\,\lambda(z_{\tau_i}) = \lambda(0) + Bz_{\tau_i}\,\frac{g(\tau_i)}{z_{\tau_i}} = A + Bg(\tau_i). \qquad (28) $$

Now each new point creates an area of size $B\|h\|_1 = B$, and hence the number
of children is Poisson($B$). The shape of the area is given via $h$, which
then proves (27). □
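The branching description makes the cluster size easy to simulate. The sketch below is an illustration only: it draws Galton-Watson trees with Poisson($B$) offspring and compares the empirical mean size with the standard value $1/(1-B)$ for $B<1$. The helper names `poisson` and `cluster_size` and the choice $B = 0.5$ are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Sample a Poisson(mean) variable by Knuth's product-of-uniforms method."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cluster_size(B, rng):
    """Total number of points in one tree (root included), offspring Poisson(B)."""
    size = 0
    alive = 1  # current generation, starting with the root
    while alive > 0:
        size += alive
        alive = sum(poisson(B, rng) for _ in range(alive))
    return size


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    mean = sum(cluster_size(0.5, rng) for _ in range(n)) / n
    print(mean)  # should be close to 1 / (1 - 0.5) = 2
```

For $B \ge 1$ the loop need not terminate, which mirrors the role of the condition $B<1$ in the existence results.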



Definition 3. We can further split the points into classes. All roots are
generated with rate $A + Bg(t)$, and in the linear case we can split them into
the ones generated from $A$ and the ones generated from $Bg(t)$. A point whose
root comes from the $g(t)$ portion will be called initial. Hence initial roots
are those that are generated by the $g(t)$ portion, and initial points are all
the points in the trees of those initial roots. The set of all initial points
is going to be denoted by $T$.



4. Existence 



We begin with preliminary results regarding existence and ergodicity. Let
us consider the following hypothesis on the Hawkes process.

Hypothesis 1. The function $\lambda$ satisfies:

$$ \exists A,B>0,\ \forall z\in\mathbb{R}_+,\quad \lambda(z)\le\bar\lambda(z) := A + Bz \qquad (29) $$

which brings us to our first result:

Proposition 1. Suppose the Hawkes process satisfies Hypothesis 1. Then the
following three statements hold:

(i) The Hawkes process is well defined for all times.

(ii) When Hypothesis 1 is satisfied with $B<1$, there exists an invariant
distribution for the process $g_t$.

(iii) If in addition $\lambda$ is non-decreasing and we start from the empty
initial condition, i.e. $g_0 = 0$, then the distribution of $g_t$ converges
weakly to $\mu_0$, the unique minimal invariant measure, which is hence
ergodic.

Proof of Proposition 1. (i) Consider the Hawkes process with rate
$\bar\lambda(z) = A + Bz$; by coupling this process is dominating, and by the
Galton-Watson representation it is well defined for all times. Hence the
dominated process is well defined for all times.

(ii) If $B<1$, then the dominating process has finite average density; then
so does the dominated process, and we can get an invariant distribution
$\mu_{g_0}$ by taking the Cesaro limit along a diagonal subsequence:

$$ \mu_{g_0} := \lim_{T\to\infty}\frac{1}{T}\int_0^T \mu_{s,g_0}\,ds \qquad (30) $$

where $\mu_{s,g_0}$ represents the marginal distribution of the impulse
function at time $s$ given the initial condition $g_0$.

(iii) The expected size of each tree is finite and hence we have finite
density. Since $\lambda$ is non-decreasing, the process is order-preserving.
Hence

$$ E[f(g(t+s_1),\ldots,g(t+s_k))] \qquad (31) $$

is non-decreasing in $t$ for any non-decreasing $f$, which implies convergence
of $\mu_{s,0}$ to $\mu_0$.

This gives us weak convergence. The claim is that the invariant distribution
obtained is the unique minimal one (in the point-wise ordering). To see this,
consider any other invariant $\tilde\mu$ and choose the initial condition
randomly according to the measure $\tilde\mu$; then we have point-wise
domination, and by coupling we see that $\tilde\mu$ dominates $\mu_0$. Hence
$\mu_0$ is also an extremal invariant measure and hence ergodic. □
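As a worked illustration of the finite density used in parts (ii) and (iii), consider the dominating linear process with $\bar\lambda(z) = A + Bz$, $B<1$ and $\|h\|_1 = 1$. The computation below is a standard heuristic sketch, assuming stationarity; the symbol $m$ for the mean intensity is introduced here only for illustration.

```latex
% Mean intensity m of the stationary linear process (assumed notation).
\begin{align*}
  m &= E[\bar\lambda(z_t)] = A + B\,E[z_t]
     = A + B \int_0^\infty h(s)\, m \, ds = A + B m, \\
  m &= \frac{A}{1-B}.
\end{align*}
% Equivalently: roots arrive at rate A and each tree has mean size 1/(1-B).
```

The same value follows from the branching picture, since each root founds a tree of mean size $\sum_{n\ge 0} B^n = 1/(1-B)$.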

5. Uniqueness 

Definition 4. Let the invariant distribution $\mu$ be supported on a class of
functions $\mathcal{C}$, meaning that

$$ \mu(\mathcal{C}) = 1 \qquad (32) $$

is satisfied. Then we say that the pair $(\lambda,h)$ is
$(\mathcal{C},\mu)$-stable if, starting from any initial condition $g_0$ in
$\mathcal{C}$, the distribution $\mu_{t,g_0}$ of the impulse function at time
$t$ converges to $\mu$:

$$ \lim_{t\to\infty}\mu_{t,g_0} \overset{d}{=} \mu \qquad (33) $$

where $\overset{d}{=}$ signifies that the limit is in the distributional sense.

Now we turn to the main results of this paper. Here is our second hypothesis:

Hypothesis 2. The function $\lambda$ is non-decreasing and it satisfies

$$ \sup_x\,\big(\lambda(x+s) - \lambda(x)\big) \le \phi(s) \qquad (34) $$

for some concave non-decreasing $\phi$ satisfying

$$ \int_0^\infty \phi(H(s))\,ds = C < \infty, \quad\text{where } H(s) = \int_s^\infty h(t)\,dt \qquad (35) $$
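To see what this hypothesis allows, here is a worked example with a non-Lipschitz rate. The specific choices $\lambda(x) = x^a$, $\phi(s) = s^a$ and the power-law kernel below are illustrative assumptions, not taken from the text.

```latex
% Illustrative choice: 0 < a < 1, p > 0 (assumed for this example only).
\begin{align*}
  \lambda(x) &= x^{a}, \qquad
  \sup_{x \ge 0}\big(\lambda(x+s) - \lambda(x)\big) = s^{a} =: \phi(s)
  \quad\text{(the supremum is attained at } x = 0\text{)}, \\
  h(t) &= \frac{p}{(1+t)^{p+1}}, \qquad
  H(s) = \int_s^\infty h(t)\,dt = (1+s)^{-p}, \\
  \int_0^\infty \phi(H(s))\,ds &= \int_0^\infty (1+s)^{-ap}\,ds
  = \frac{1}{ap-1} < \infty \quad\text{if and only if } ap > 1.
\end{align*}
```

Here $\lambda$ is not Lipschitz at $0$, yet the integrability condition holds as soon as the kernel decays fast enough relative to the exponent $a$.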

Theorem 2. If the Hawkes process satisfies both Hypothesis 1 with $B<1$ and
Hypothesis 2, then the pair $(\lambda,h)$ is $(\mathcal{C},\mu)$-stable as in
Definition 4 with

$$ \mathcal{C} := \Big\{ g\in C(\mathbb{R}_+) : \int_0^\infty \phi(g(s))\,ds < \infty \Big\} \qquad (36) $$

and $\mu$ being $\mu_0$ from Proposition 1 part (iii).

Remark 2. In this theorem we relax the Lipschitz condition with constant
1 on $\lambda$ that was imposed in [1] (in a slightly different form, since
$\|h\|_1$ was not normalized). When we do have this assumption, the results of
[1] follow as a corollary of Theorem 2. In fact the results follow even under
the weaker assumption that $\lambda$ is Lipschitz with constant $L$ for some
$L$:

$$ \forall x,y\in\mathbb{R}_+,\quad |\lambda(x)-\lambda(y)| \le L\,|x-y| \qquad (37) $$

Before we start with the proof of Theorem 2, let us prove the following lemma:

Lemma 2.1. If $\mu$ is a stationary distribution of the impulse function $g_t$
of a Hawkes process, then

$$ E_\mu[g(s)] = E_\mu[g(0)]\,H(s). \qquad (38) $$

Proof. By stationarity

$$ E_\mu[g(s)] = E_\mu\Big[\int_s^\infty g(0)\,h(t)\,dt\Big] = E_\mu[g(0)]\int_s^\infty h(t)\,dt = E_\mu[g(0)]\,H(s). \qquad (39) $$

□



Proof of Theorem 2. We use a recurrence argument. It is enough to show
that there exists some class $\mathcal{C}'\subset\mathcal{C}$ such that:

(i) $f\in\mathcal{C}'$ implies that $P^*_f$ and $P^*_0$ have non-trivial
overlap

$$ I(P^*_f,P^*_0) := 1 - d_{TV}(P^*_f,P^*_0) > \delta > 0 \qquad (40) $$

(ii) $\mathcal{C}'$ is recurrent with respect to $\mathcal{C}$ in the sense
that, starting from any point in the class $\mathcal{C}$, the impulse function
will enter $\mathcal{C}'$ at some time in the future, i.e.

$$ \forall g_0\in\mathcal{C},\quad P_{g_0}\big[\forall t\in\mathbb{R}_+,\ g_t\notin\mathcal{C}'\big] = 0 \qquad (41) $$

Hence we will split the proof into three steps. In steps (i) and (ii) we
will show the above statements, and in step (iii) we will run the recurrence
argument. Step (i) in turn will suggest $\mathcal{C}'$.

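Step (i): First observe that by applying Jensen's inequality and Hypothesis 2
we obtain:

$$ I(P^*_f,P^*_0) = E\Big[\exp\Big(-\int_0^\infty |\lambda(z_t+f(t)) - \lambda(z_t)|\,dt\Big)\Big] \qquad (42) $$
$$ \ge \exp\Big(-E\Big[\int_0^\infty |\lambda(z_t+f(t)) - \lambda(z_t)|\,dt\Big]\Big) \qquad (43) $$
$$ \ge \exp\Big(-E\Big[\int_0^\infty \phi(f(t))\,dt\Big]\Big) \qquad (44) $$
$$ = \exp\Big(-\int_0^\infty \phi(f(t))\,dt\Big) \qquad (45) $$

Step (ii): Step (i) suggests that we take

$$ \mathcal{C}' := \Big\{ g\in C(\mathbb{R}_+) : \int_0^\infty \phi(g(s))\,ds < K \Big\} \qquad (46) $$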

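where, to satisfy the $\mathcal{C}$-recurrence condition, we will choose $K$ as
follows. We know that for any finite time $t$ the process which started from
$\mathcal{C}$ will remain in $\mathcal{C}$, since $\phi$ is concave. Now
consider starting from $\mathcal{C}$ but with $\bar\lambda$ instead of
$\lambda$ as the parameter of our process. Then applying domination and
Jensen's inequality we get

$$ E^{\lambda}\Big[\int_0^\infty \phi(g_t(s))\,ds\Big] \le E^{\bar\lambda}\Big[\int_0^\infty \phi(g_t(s))\,ds\Big] \qquad (47) $$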



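Apply $\limsup_{t\to\infty}$ to both sides of (47) and then use the fact that
the right hand side converges from any initial $g$ by Proposition 1 part (iii):

$$ \lim_{t\to\infty} E^{\lambda}\Big[\int_0^\infty \phi(g_t(s))\,ds\Big] \le \lim_{t\to\infty} E^{\bar\lambda}\Big[\int_0^\infty \phi(g_t(s))\,ds\Big] \qquad (48) $$
$$ \le \lim_{t\to\infty} \int_0^\infty \phi\big(E^{\bar\lambda}[g_t(s)]\big)\,ds \qquad (49) $$
$$ \le \lim_{t\to\infty} \int_0^\infty \phi\big(E^{\bar\lambda}[g_t(0)]\,H(s)\big)\,ds \qquad (50) $$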



where line (50) follows from Lemma 2.1. Recall that part of the assumption of
Hypothesis 2 is that $\phi$ is concave and increasing. Concavity implies that
$\phi(cx)\le c\,\phi(x)$ for $c>1$, while monotonicity implies that
$\phi(cx)\le\phi(x)$ for $c<1$. Combining these two we get
$\phi(cx)\le\max(c,1)\,\phi(x)$, which then applied to (50) with
$c = E^{\bar\lambda}[g_t(0)]$ implies:

$$ \lim_{t\to\infty} E^{\lambda}\Big[\int_0^\infty \phi(g_t(s))\,ds\Big] \le C \lim_{t\to\infty} \max\big(E^{\bar\lambda}[g_t(0)],\,1\big) \qquad (51) $$

where $C = \int_0^\infty \phi(H(s))\,ds < \infty$ by Hypothesis 2. Hence (51)
is finite, since $\lim_{t\to\infty} E^{\bar\lambda}[g_t(0)] < \infty$ by the
branching representation. Hence $\mu(\mathcal{C}) = 1$ and it is enough to set

$$ K = \int_0^\infty \phi(H(s))\,ds \cdot \lim_{t\to\infty} \max\big(E^{\bar\lambda}[g_t(0)],\,1\big) \qquad (52) $$

in (46).

Step (iii): Hence we recur to the class $\mathcal{C}'$. Now, starting from any
initial condition $f$ in $\mathcal{C}'$, steps (i) and (ii) give

$$ I(P^*_f,P^*_0) \ge \exp(-K) \qquad (53) $$

and the result follows from the following schematic representation

$$ g_0\in\mathcal{C} \rightleftarrows \mathcal{C}' \to \mu \qquad (54) $$

where all arrows represent transitions with uniformly positive probability,
which implies the convergence to $\mu$.

Let us describe this in detail. Define two sequences of interlacing
stopping times as follows. Let the stopping time $\sigma_1$ be the first time
that $g_t$ is in $\mathcal{C}'$. Then we try to couple the process $P$ with
$P^{(1)}$, where $P^{(1)}$ is the Hawkes process with the same pair
$(\lambda,h)$ but that starts at time $\sigma_1$ with $g = 0$; we use the
canonical coupling. Then by step (i) the coupling is successful with
probability at least $\delta>0$. If it is not, then there is a stopping time
$\nu_1\in S\setminus S^{(1)}$; in this case we restart the procedure, defining
$\sigma_i,\nu_i$ by the following recursive definition:

$$ \sigma_1 = \inf\{t : g_t\in\mathcal{C}'\} $$
$$ \forall i\ge 1,\quad \nu_i = \inf\{t>\sigma_i : t\in S\setminus S^{(i)}\} \qquad (55) $$
$$ \forall i>1,\quad \sigma_i = \inf\{t>\nu_{i-1} : g_t\in\mathcal{C}'\} $$

where $S^{(k)}$ is the Hawkes process that starts at time $\sigma_k$ with
condition $g = 0$. Hence we want to show that almost surely for some $i$,
$\sigma_i = \infty$, which is immediate from step (i):

$$ P[\forall i,\ \sigma_i<\infty] = E\Big[\prod_{i=1}^\infty \big(1 - I(P^*_{g_{\sigma_i}},P^*_0)\big)\Big] \qquad (56) $$
$$ \le \prod_{i=1}^\infty \big(1 - \exp(-K)\big) = 0. \qquad (57) $$

□



Finally we turn to the last theorem, where the assumption of continuity
of $\lambda$ is relaxed.

Hypothesis 3. (i) The function $\lambda$ satisfies:

$$ \lambda \text{ non-decreasing},\quad \lambda(0)>0;\quad \lambda\le\bar\lambda,\ B<1 \qquad (58) $$

(ii) The function $h$ is convex and satisfies:

$$ \|h\|_\infty < \infty;\quad \log|h'(x)| = o(x);\quad \int_0^\infty t\,h(t)\,dt < \infty \qquad (59) $$

Remark 3. (i) These assumptions include the functions which were the aim
of the study, that is $h(x) = \frac{p}{(1+x)^{p+1}}$ where $p>0$.

(ii) These assumptions can be weakened. In particular $h$ can have jumps, as
long as there is a finite number of jumps up and the property

$$ a(x) = \log\Big|\inf_{z<x} -h'(z)\Big| = o(x) \qquad (60) $$

is satisfied.

Theorem 3. If the Hawkes process satisfies Hypothesis 1 with $B<1$ and
Hypothesis 3, then the pair $(\lambda,h)$ is $(\mathcal{C},\mu)$-stable as in
Definition 4 with

$$ \mathcal{C} := \Big\{ g\in C(\mathbb{R}_+) : \int_0^\infty t\,g(t)\,dt < \infty \Big\} \qquad (61) $$

and $\mu$ being $\mu_0$ from Proposition 1 part (iii).

Definition 5. Let $\nu$ denote the distribution of $g(0)$ under $\mu$.

Next we state estimates on the tail of $\nu$ as well as on the probability
density of $\nu$. We will prove the lemmas below after the proof of Theorem 3.

Lemma 3.1. Under Hypothesis 3 the following statements hold:

$$ \exists\theta_1>0,\quad E_\mu\big[e^{\theta_1 z}\big] < \infty. \qquad (62) $$

In particular $\nu$ has an exponential tail:

$$ \nu[z,\infty) < c_1 e^{-\theta_1 z}. \qquad (63) $$

Lemma 3.2. Under Hypothesis 3, for any invariant distribution $\mu$, the
associated $\nu$ as in Definition 5 has an associated density $\psi(z)$, and
there exists a constant $c'_1$ (independent of $\nu$) such that

$$ \psi(z) \le c'_1\,\lambda(z)\,\nu[z+h(0),\infty) \qquad (64) $$

Furthermore, Lemma 3.1 implies that there exist $c_2,\theta_2$ such that

$$ \psi(z) \le c_2 e^{-\theta_2 z} \qquad (65) $$

Proof of Theorem 3. Unlike in the proof of Theorem 2, the comparison is done
with the initial condition taken as the invariant distribution shifted by
$+f$. Rigorously, we define this condition as follows. Consider the shift map
$S_f$ by $f$ on the space $X$, defined as

$$ S_f : X\to X,\quad S_f(g) = f+g \qquad (66) $$

Then let

$$ \mu_f = S_f^*(\mu_0) \qquad (67) $$

where $S_f^*$ is the push-forward of the measure, i.e.

$$ S_f^*(\mu_0)(A) = \mu_0\big(S_f^{-1}(A)\big) \qquad (68) $$

for any measurable $A$.

The structure of the proof remains the same as in Theorem 2, while the
calculations (47)-(51) are as follows.



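$$ I(P^*_{\mu_f},P^*_{\mu_0}) = E_{\mu_0}\Big[\exp\Big(-\int_0^\infty \lambda(z_t+f(t)) - \lambda(z_t)\,dt\Big)\Big] \qquad (69) $$
$$ \ge \exp\Big(-E_{\mu_0}\Big[\int_0^\infty \lambda(z_t+f(t)) - \lambda(z_t)\,dt\Big]\Big) \qquad (70) $$

by Jensen's inequality. Due to the invariance of $\mu_0$ we can replace $z_t$
by $z_0$:

$$ E_{\mu_0}\Big[\int_0^\infty \lambda(z_t+f(t)) - \lambda(z_t)\,dt\Big] = E_{\mu_0}\Big[\int_0^\infty \lambda(z_0+f(t)) - \lambda(z_0)\,dt\Big] \qquad (71) $$
$$ = \iiint_{z<x<z+f(t)} \psi(z)\,d\lambda(x)\,dz\,dt \qquad (72) $$
$$ = \iiint_{x-f(t)<z<x} \psi(z)\,dz\,d\lambda(x)\,dt \qquad (73) $$
$$ \text{(via Lemma 3.2)}\quad \le \iiint_{x-f(t)<z<x} c_2 e^{-\theta_2 z}\,dz\,d\lambda(x)\,dt \qquad (74) $$
$$ \le \iiint_{-f(t)<z<0} c_2 e^{-\theta_2 x} e^{-\theta_2 z}\,dz\,d\lambda(x)\,dt \qquad (75) $$
$$ \le \iiint_{-f(t)<z<0} c_2 e^{-\theta_2 x} e^{\theta_2 f(t)}\,dz\,d\lambda(x)\,dt \qquad (76) $$
$$ = c_2\,\big\|f e^{\theta_2 f}\big\|_{L^1} \int e^{-\theta_2 x}\,d\lambda(x) \qquad (77) $$
$$ \le c_2\,\|f\|_{L^1}\, e^{\theta_2\|f\|_{L^\infty}} \int e^{-\theta_2 x}\,d\lambda(x) \qquad (78) $$
$$ < \infty \qquad (79) $$

where the last line follows from Hypothesis 3, since $f\in L^1\cap L^\infty$
and the last term can be integrated by parts:

$$ \int_{x\in\mathbb{R}_+} e^{-\theta_2 x}\,d\lambda(x) = \lambda(0) + \int_{x\in\mathbb{R}_+} \theta_2 e^{-\theta_2 x}\,\lambda(x)\,dx < \infty \qquad (80) $$

and the rest of the proof follows from Theorem 2. □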

5.1. Lemmas for Theorem 3. We note that the two lemmas proved in this section
are interesting results in themselves, as they provide non-trivial tail and
local behavior of the distribution $\nu$, which were previously unknown. Let
us first turn to the proof of the tail estimate.



Proof of Lemma 3.1: Observe first that $\nu[z,\infty]\le\bar\nu[z,\infty]$ by
domination, where $\bar\nu$ corresponds to the dominating process with rate
$\bar\lambda$ and invariant distribution $\bar\mu$. Then use the Galton-Watson
representation in the following way. Consider a random tree in our process
that has its root at $\rho$. The points of this tree are a random collection
$T_\rho$ (including the point at $\rho$); let us call the probability measure
associated with $T$, $P^T$, and let the expectation with respect to $P^T$ be
denoted by $E^T$. It is useful to define the following objects:

$$ \Lambda_{\theta,T}(t) := \ln E^T\Big[\exp\Big(\theta\sum_{\tau\in T_\rho} h(t+\rho-\tau)\Big)\Big] \qquad (81) $$

$$ \Lambda_\theta := \ln E_{\bar\mu}\big[\exp(\theta z)\big] \qquad (82) $$

The two are related. In fact, for linear $\bar\lambda(z) = A + Bz$ the
following relations hold for the $\Lambda$'s via symmetry:

$$ \Lambda_{\theta,T}(t) = \theta h(t) + B\int \big(e^{\Lambda_{\theta,T}(t-s)} - 1\big)\,h(s)\,ds \qquad (83) $$

$$ \Lambda_\theta = A\int \big(e^{\Lambda_{\theta,T}(s)} - 1\big)\,ds \qquad (84) $$

where in (83) the term $\theta h(t)$ comes from the fact that $\rho\in T$, and
the recurrence relation then follows from the contribution of the children of
$\rho$, which have rate $Bh(s)$ and contribution
$e^{\Lambda_{\theta,T}(t-s)} - 1$; and in (84) we just count the contribution
of a tree rooted at $-s$, which has rate $A$ and contribution
$e^{\Lambda_{\theta,T}(s)} - 1$. Then what we need in order to prove the lemma
for $\bar\mu$ is $\Lambda_\theta<\infty$.

Recall that $h\in L^\infty$ and look at $\Lambda$ as a function of $\theta$:
near $\theta = 0$ both functionals are continuous. Hence the conclusion
holds. □

Having obtained the exponential tail, we now prove that the tail does not have
bumps:

Proof of Lemma 3.2: Let $L(s,t,A)$ be the occupation time spent in $A$ by
$z_t := g_t(0)$ between times $s$ and $t$, starting from an arbitrary initial
condition, i.e.

$$ L(s,t,A) = \int_s^t 1_{z_u\in A}\,du. \qquad (85) $$

Let $I = [z,z+\epsilon]$, $\tau_0 = 0$, and let $\tau_i$, $i>0$, be the $i$-th
jump after 0, that is

$$ \tau_i = \inf\{t>\tau_{i-1} : t\in S\}. \qquad (86) $$

Furthermore let

$$ m_t = \inf\{s>t : s\in S\} \qquad (87) $$

be the next jump after time $t$. Then observe that

$$ \nu[I] = \lim_{t\to\infty} \frac{E_\mu[L(0,t,I)]}{t} \qquad (88) $$

and turn to the analysis of the numerator:

$$ L(0,t,I) \le \sum_i L(\tau_i,\tau_{i+1},I)\,1_{\tau_i<t} \qquad (89) $$

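and taking expectations and using the tower property of expectations we obtain:

$$ E[L(0,t,I)] \le E\Big[\sum_i E\big[L(\tau_i,\tau_{i+1},I)\,1_{\tau_i<t} \,\big|\, \mathcal{F}_{\tau_i}\big]\Big] \qquad (91) $$
$$ = E\Big[\sum_i F(g_{\tau_i})\,1_{\tau_i<t}\Big] \qquad (92) $$

where the function $F : X\to\mathbb{R}_+$ is defined as

$$ F(g) = E_g[L(0,\tau_1,I)] \qquad (93) $$

and then we finally have

$$ E[L(0,t,I)] \le E\Big[\int_0^t F(g_{s-})\,dN_s\Big] \qquad (94) $$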

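which we can now use, as the integrand under $dN_t$ is measurable:

$$ \nu[I] = \lim_{t\to\infty} \frac{E_\mu[L(0,t,I)]}{t} \qquad (95) $$
$$ \text{(via (94))}\quad \le \lim_{t\to\infty} \frac{E_\mu\big[\int_0^t F(g_{s-})\,dN_s\big]}{t} \qquad (96) $$
$$ \le \lim_{t\to\infty} \frac{E_\mu\big[\int_0^t F(g_{s-}+h)\,\lambda(g_s(0))\,ds\big]}{t} \qquad (97) $$
$$ \text{(invariance)}\quad \le \lim_{t\to\infty} \frac{E_\mu\big[\int_0^t F(g_0+h)\,\lambda(g_0(0))\,ds\big]}{t} \qquad (98) $$
$$ = \int E_{g+h}[L(0,\tau_1,I)]\,\lambda(g(0))\,d\mu(g) \qquad (99) $$
$$ \le \int E_{g+h}\Big[\frac{\epsilon}{-h'(\tau_1)}\Big]\,1_{g(0)+h(0)>z}\,\lambda(g(0))\,d\mu(g) \qquad (100) $$

where the last inequality follows from condition (59): $|h'|$ is decreasing due
to the convexity condition, and hence
$L(0,\tau_1,I)\le\epsilon/(-h'(\tau_1))$, since between two jumps we can cross
$I$ only once.

Observe that

$$ E_{g+h}\Big[\frac{\epsilon}{-h'(\tau_1)}\Big] \le \epsilon\,C_1 \qquad (101) $$

where the constant $C_1>0$ follows from $-h'(t) = e^{o(t)}$ and
$P[\tau_1>t]\le e^{-\lambda(0)t}$, since the rate of jumps is always at least
$\lambda(0)$. Then we continue from (100):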



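$$ \nu[I] \le \int E_{g+h}\Big[\frac{\epsilon}{-h'(\tau_1)}\Big]\,1_{g(0)+h(0)>z}\,\lambda(g(0))\,d\mu(g) \qquad (102) $$
$$ \le \epsilon\,C_1 \int 1_{g(0)+h(0)>z}\,\lambda(g(0))\,d\mu(g) \qquad (103) $$
$$ = \epsilon\,C_1 \int_{x>z+h(0)} \lambda(x)\,d\nu(x) \qquad (104) $$
$$ \le \epsilon\,c'_1\,\lambda(z)\,\nu[z+h(0),\infty) \qquad (105) $$

where the last inequality follows from the fact that $d\nu(x)$ has an
exponential tail. Hence we have

$$ \nu[I] \le \epsilon\,c'_1\,\lambda(z)\,\nu[z+h(0),\infty) \qquad (106) $$

from which we first observe that there is a density $\psi$, since then
$\nu[I]\le c'_1\lambda(z)\,\epsilon$, and then taking the limit as
$\epsilon\to 0$ in (106) we obtain the local estimate. □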



6. Speed of Convergence 

Proving the speed of convergence is difficult in general from the schemes of
Theorems 2 and 3. For this we need to adjust Hypothesis 2 to the following
version:

Hypothesis 4. The function $\lambda$ is non-decreasing and it satisfies

$$ \sup_x\,\big(\lambda(x+s) - \lambda(x)\big) \le \phi(s) \qquad (107) $$

for some $\phi$ satisfying

$$ \int_0^\infty \phi(h(t))\,dt < 1 \qquad (108) $$

and $h$ satisfying:

$$ \limsup_{t\to\infty} \frac{\sum_{n=1}^\infty B^n\,n \int_{x>t/n} h(x)\,dx}{\int_{x>t} h(x)\,dx} =: C_h < \infty \qquad (109) $$

In addition $\|g\|_1 = \int g(t)\,dt < \infty$.



Remark 4. Condition (109) can be replaced by other conditions, as will be
seen in the proof, but its important feature is that all nicely decaying
$h(x)$, for example $h(x) = \frac{p}{(1+x)^{p+1}}$, $p>0$, satisfy it. One
example that does not is
$h(x) = \sum_{i=0}^\infty 2^{-2i-1}\,1_{2^i\le x<2^{i+1}}$, where
$1_{2^i\le x<2^{i+1}}$ is an indicator function that is 1 if
$x\in[2^i,2^{i+1})$ and 0 otherwise.
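As a quick consistency check, and assuming the step kernel of the remark reads $h(x) = \sum_{i\ge 0} 2^{-2i-1}\,1_{[2^i,2^{i+1})}(x)$, it is normalized in the sense of Remark 1, while its tail is constant along dyadic blocks:

```latex
% Normalization and tail of the assumed step kernel.
\begin{align*}
  \int_0^\infty h(x)\,dx
    &= \sum_{i=0}^{\infty} 2^{-2i-1}\,\big(2^{i+1} - 2^{i}\big)
     = \sum_{i=0}^{\infty} 2^{-i-1} = 1, \\
  \int_{x > 2^{k}} h(x)\,dx
    &= \sum_{i=k}^{\infty} 2^{-i-1} = 2^{-k}.
\end{align*}
```

So the kernel is a probability density whose tail decays only polynomially and in jumps, which is the kind of irregular decay a tail-ratio condition is sensitive to.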

Theorem 4. If the Hawkes process satisfies Hypothesis 1 with $B<1$ and
Hypothesis 4, then the following statements about the speed of convergence
hold:

$$ d_{TV,t}(P_g,P_0) \le \int_t^\infty \Big(\frac{\hat g}{1 - B\hat h}\Big)^{\vee}(s)\,ds \qquad (110) $$

where $\hat{\ }$ stands for the Fourier transform and ${}^\vee$ for the inverse
Fourier transform. In turn this implies that the tail of $d_{TV,t}(P_g,P_0)$
cannot be much worse than that of $h(t)$ or $g(t)$; more precisely:

$$ \limsup_{t\to\infty} \frac{d_{TV,t}(P_g,P_0)}{\int_t^\infty g(s)+h(s)\,ds} < \infty \qquad (111) $$

Proof. Consider the canonical coupling $P$ of the measures $P_g$ and $P_0$,
that is, the pair of processes $(g,g^{(0)})$ which start from the initial
conditions $g_0 = g$ and $g^{(0)}_0 = 0$ respectively. Let
$\Delta g = g - g^{(0)}$ be the difference of the impulse functions, let
$\Delta S = S\setminus S^{(0)}$ be the difference of the sets, and let the
last point in $\Delta S$ be

$$ L := \max\{s\in\mathbb{R}_+ : s\in\Delta S\} \qquad (112) $$

Then observe that the concavity of $\phi$ and $\phi(0) = 0$ imply subadditivity
of $\phi$, i.e. $\phi(a+b)\le\phi(a)+\phi(b)$, which in turn implies:

$$ \phi\big(\Delta g_t(s)\big) = \phi\Big(g_0(t+s) + \sum_{\tau\in\Delta S,\,\tau<t} h(t+s-\tau)\Big) \qquad (113) $$
$$ \le \phi\big(g_0(t+s)\big) + \sum_{\tau\in\Delta S,\,\tau<t} \phi\big(h(t+s-\tau)\big) \qquad (114) $$

By setting $s = 0$, the process with rate at time $t$ being

$$r_t := \phi(g_0(t)) + \sum_{\tau \in \Delta S,\, \tau < t} \phi(h(t - \tau)) \tag{115}$$

is a strictly bigger process that has the following interpretation: one originally
has roots that are thrown on the real line with rate $\phi(g(t))$, and then each
root generates a Galton-Watson tree in which the children of
any parent at time $X$ are born at times $X + t$ with Poisson intensity $\phi(h(t))$. Let $\Delta D$ be this
bigger process that dominates $\Delta S$ and let its last point be:

$$L_D = \max\{s \in \mathbb{R}_+ : s \in \Delta D\} \tag{116}$$
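The root-and-tree description of the dominating process can be sketched as a small simulation. This is an illustration under assumptions that are ours, not the paper's: we take the linear case $\phi(x) = Bx$, we take $h$ to be the probability density $p/(1+x)^{p+1}$, and we let $g$ have the same shape with total mass `g_mass`. All function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_delay(rng, p):
    """Inverse-CDF draw from the density p / (1 + x)^(p + 1) on [0, inf)."""
    u = 1.0 - rng.random()  # u lies in (0, 1]
    return u ** (-1.0 / p) - 1.0


def simulate_cluster_process(rng, B, g_mass, p):
    """Return the sorted points of one realization of the dominating process."""
    points = []
    # Roots: a Poisson(B * ||g||_1) number of points with i.i.d. positions.
    stack = [sample_delay(rng, p) for _ in range(poisson(rng, B * g_mass))]
    while stack:
        parent = stack.pop()
        points.append(parent)
        # Each point has Poisson(B) children, shifted by i.i.d. delays from h.
        for _ in range(poisson(rng, B)):
            stack.append(parent + sample_delay(rng, p))
    return sorted(points)


def mean_points_after(t, B, g_mass, p, runs, seed=0):
    """Monte Carlo estimate of the expected number of points in [t, inf)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        realization = simulate_cluster_process(rng, B, g_mass, p)
        total += sum(1 for x in realization if x >= t)
    return total / runs


if __name__ == "__main__":
    print(mean_points_after(0.0, B=0.5, g_mass=2.0, p=1.5, runs=20000))
    print(mean_points_after(50.0, B=0.5, g_mass=2.0, p=1.5, runs=20000))
```

With $B < 1$ every tree is subcritical, so under these assumptions the expected total number of points is $B\|g\|_1/(1-B)$, which the estimate at $t = 0$ should reproduce up to Monte Carlo error.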
Let us first outline the idea of the bound. Notice that

$$d_{TV,t}(\mathbb{P}_g, \mathbb{P}_0) = P[L > t] \le P[L_D > t] \le \mathbb{E}[\#(\Delta D \cap [t, \infty))] \tag{117}$$

where the first inequality follows from $L \le L_D$ due to $\Delta S \subset \Delta D$ and the
second is due to $L_D \in \Delta D$. So we will be bounding $\mathbb{E}[\#(\Delta D \cap [t, \infty))]$,
which turns out to be much simpler to work with. If we define

$$l(t) := \lim_{\delta \to 0} \frac{P[\Delta D \cap [t, t + \delta] \neq \emptyset]}{\delta} = -\frac{d}{dt}\, \mathbb{E}[\#(\Delta D \cap [t, \infty))] \tag{118}$$

then:

$$\int_0^\infty e^{\theta t}\, l(t)\,dt = \mathbb{E}\Big[\sum_{\tau \in \Delta D} e^{\theta \tau}\Big] \tag{119}$$



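Next we compute $\mathbb{E}[\sum_{\tau \in \Delta D} e^{\theta \tau}]$ and get either the exact formula or the shape of
the tail, respectively. Observe through recurrence that:

$$\mathbb{E}\Big[\sum_{\tau \in T_\rho} e^{\theta(\tau - \rho)}\Big] = 1 + \Big(\int_0^\infty B h(t) e^{\theta t}\,dt\Big)\, \mathbb{E}\Big[\sum_{\tau \in T_\rho} e^{\theta(\tau - \rho)}\Big] \tag{120}$$

$$= \frac{1}{1 - B \int_0^\infty h(t) e^{\theta t}\,dt} \tag{121}$$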



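where 1 comes from the root; $Bh(t)$ is the rate of birth of the (direct) children
of the root at time $\rho + t$; $e^{\theta t}$ comes from the time shift; and finally, since
each child generates the same tree, we multiply by $\mathbb{E}[\sum_{\tau \in T_\rho} e^{\theta(\tau - \rho)}]$.
Now similarly, letting $\rho$ be an arbitrary root and $T_\rho$ be its tree, we further
obtain through recurrence and the use of (120):

$$\mathbb{E}\Big[\sum_{\tau \in \Delta D} e^{\theta \tau}\Big] = B \Big(\int_0^\infty g(t) e^{\theta t}\,dt\Big)\, \mathbb{E}\Big[\sum_{\tau \in T_\rho} e^{\theta(\tau - \rho)}\Big] \tag{122}$$

$$= \frac{B \int_0^\infty g(t) e^{\theta t}\,dt}{1 - B \int_0^\infty h(t) e^{\theta t}\,dt} \tag{123}$$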



where the formula in (122) is similar to (120): there is no 1 since there is no point
at 0; the rate is $Bg(t)$ and the multiplication is the same.
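As a hedged aside, the closed form (123) can be unfolded into a series, which is what makes the later comparison with convolution powers of $h$ possible. The sketch below assumes $\|h\|_1 \le 1$, so that $|B\hat h(\theta)| \le B < 1$ for $\theta \in i\mathbb{R}$, and writes $\hat g(\theta) = \int_0^\infty g(t) e^{\theta t}\,dt$ and $h^{*n}$ for the $n$-fold convolution (with $h^{*0}$ the unit mass at 0); this notation is ours.

```latex
\[
\frac{B\,\hat g(\theta)}{1 - B\,\hat h(\theta)}
  = B\,\hat g(\theta) \sum_{n=0}^{\infty} \bigl(B\,\hat h(\theta)\bigr)^{n}
  = B \sum_{n=0}^{\infty} B^{n}\, \widehat{g * h^{*n}}(\theta),
\qquad\text{so that}\qquad
l(t) = B \sum_{n=0}^{\infty} B^{n}\, \bigl(g * h^{*n}\bigr)(t).
\]
```

Each term corresponds to the points of generation $n$ in the trees: a root placed according to $g$, followed by $n$ independent displacements distributed according to $h$.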

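Finally, when combining (117), (120) and (123) with $\theta \in i\mathbb{R}$ we obtain
the exact Fourier formula (110). Observe that if $f_1, f_2, \dots, f_k$ are probability
density functions and $a_i > 0$, $i \in \{1, 2, \dots, k\}$, $\sum_{i=1}^{k} a_i = 1$, then

$$\int_t^\infty (f_1 * f_2 * \cdots * f_k)(s)\,ds = \int_{\sum_{i} s_i > t} f_1(s_1) \cdots f_k(s_k)\,ds_1 \cdots ds_k$$

$$\le \sum_{i=1}^{k} \int_{s_i > a_i t} f_1(s_1) \cdots f_k(s_k)\,ds_1 \cdots ds_k \tag{124}$$

$$= \sum_{i=1}^{k} \int_{s_i > a_i t} f_i(s_i)\,ds_i$$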

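where the second line follows from the fact that $\sum_{i=1}^{k} s_i > t$ implies that $s_i > a_i t$ for
some $i$. (Another way to see this is via a probabilistic interpretation:
if the random variables $X_i$ are independent and have distribution $f_i$ then

$$P\Big[\sum_{i=1}^{k} X_i > t\Big] \le P\big[X_i > a_i t \text{ for some } i \in \{1, 2, \dots, k\}\big] \le \sum_{i=1}^{k} P[X_i > a_i t]$$

where the first and last expressions correspond to those of (124).) Now we
use this to get

$$d_{TV,t}(\mathbb{P}_g, \mathbb{P}_0) \le \int_t^\infty \Big(g * \Big(\frac{1}{1 - B\hat h}\Big)^{\vee}\Big)(s)\,ds \tag{125}$$

$$= \|g\|_1 \sum_{n=0}^{\infty} B^n \int_t^\infty \Big(\frac{g}{\|g\|_1} * \underbrace{h * \cdots * h}_{n \text{ times}}\Big)(s)\,ds \tag{126}$$

$$\le \|g\|_1 \sum_{n=0}^{\infty} B^n \Big(\int_{s > \frac{t}{2}} \frac{g(s)}{\|g\|_1}\,ds + n \int_{s > \frac{t}{2n}} h(s)\,ds\Big) \tag{127}$$

$$= \frac{1}{1 - B} \int_{s > \frac{t}{2}} g(s)\,ds + \|g\|_1 \sum_{n=1}^{\infty} B^n\, n \int_{s > \frac{t}{2n}} h(s)\,ds \tag{128}$$

(the $h$ term in (127) is absent for $n = 0$)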

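using (124) with $k = n + 1$, $f_1 = \frac{g}{\|g\|_1}$, $f_2 = f_3 = \cdots = f_{n+1} = h$, $a_1 = \frac{1}{2}$,
$a_2 = a_3 = \cdots = a_{n+1} = \frac{1}{2n}$. Finally:

$$\limsup_{t \to \infty} \frac{d_{TV,t}(\mathbb{P}_g, \mathbb{P}_0)}{\int_{t/2}^{\infty} \big(g(s) + h(s)\big)\,ds} \tag{129}$$

$$\le \limsup_{t \to \infty} \frac{\frac{1}{1 - B} \int_{s > \frac{t}{2}} g(s)\,ds + \|g\|_1 \sum_{n=1}^{\infty} B^n\, n \int_{s > \frac{t}{2n}} h(s)\,ds}{\int_{t/2}^{\infty} \big(g(s) + h(s)\big)\,ds} \tag{130}$$

$$\le \frac{1}{1 - B} + \|g\|_1 C_4 < \infty \tag{131}$$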

□ 
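The tail bound (124) used in the proof can be checked on a case where both sides are available in closed form. The sketch below is only an illustration under our own assumptions: all $f_i$ are taken to be the Exp(1) density, so the left-hand side is an Erlang tail, and the function names are hypothetical.

```python
import math


def erlang_tail(k, t):
    """P[X_1 + ... + X_k > t] for i.i.d. Exp(1) variables (Erlang tail)."""
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))


def union_bound(k, t, weights=None):
    """Right-hand side of (124): sum over i of P[X_i > a_i * t] for Exp(1) marginals."""
    if weights is None:
        weights = [1.0 / k] * k  # equal split a_i = 1 / k
    if len(weights) != k or any(a <= 0 for a in weights):
        raise ValueError("need k positive weights")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(math.exp(-a * t) for a in weights)


if __name__ == "__main__":
    for k in (1, 2, 5):
        for t in (0.5, 5.0, 50.0):
            print(k, t, erlang_tail(k, t), union_bound(k, t))
```

For $k = 1$ the two sides coincide, while for larger $k$ the bound is loose but keeps the decay governed by the slowest single-variable tail, which is all the proof needs.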

7. Generalizations. 

7.1. Multi-type, Gibbs and Spatial Generalization. This section suggests
three generalizations of Theorem 2:

(1) Hawkes process can be multi-type. 

(2) $h(dt)$ can be supported on $\mathbb{R}$, thus making the process "anticipating".

(3) $h(t)\,dt$ can be replaced by a measure $h$.

The proofs are simple modifications of the ones we have presented here.
Let us now give a taste of the generalizations. Suppose that the points are now of
finitely many types and the set of types is $E = \{1, \dots, d\}$.

Now the process is specified by an analogous triple $(h, \lambda, g)$:

• hij(t) : E x E x R — >• R + where i,j G E, WhijW^ = h 

• \ : E x Rf ->■ K+ 

• ^(t) :£xExl4l + . 

Then for all $e \in E$ the rate at time $t$ of type $e$ is given by

$$r_e(t) = \lambda_e(z(t)) \tag{132}$$

where $z$ is a matrix given by

%=ffu(*)+Xl X] M*" 7 ") ( 133 ) 
where $S_i(t)$ denotes the set of all points of type $i$ before time $t$. When $h$ is a measure,
one needs the following condition to extend the functionals $\lambda_e$:

Ve G, 3kfj G , Urn = £ (134) 

where c£Kto and A;^- for each e £ E describes behavior of A e at infinity. 
As the analogue of Hypothesis 1 we now have:

3k G Rf, Ve G E, X e (z) < X e (z) := c e + ^ ^jA;?- (135) 

As a generalization of Proposition 1 we get:

Corollary 1.1. Suppose condition (135) is satisfied and that, if $h$ is a measure,
condition (134) is satisfied as well. Then the following statements hold:

(i) If for each $e \in E$, $\lambda_e$ is non-decreasing, then the Hawkes process is well defined.

(ii) If in addition the matrix

mi , e = kfj (136) 

indexed by $i, e \in E$ has spectrum supported in the open unit disk $\{c \in \mathbb{C} : |c| < 1\}$,
then the invariant measure exists and is the unique minimal measure and hence
ergodic.
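The spectral condition in (ii) amounts to the matrix having spectral radius strictly below 1, which can be checked numerically. The sketch below is a pure-Python illustration and not part of the paper: it estimates the spectral radius through Gelfand's formula $\rho(M) = \lim_k \|M^k\|^{1/k}$ with repeated squaring, and the sample matrices and function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sum_norm(m):
    """Maximum absolute row sum (the induced infinity norm)."""
    return max(sum(abs(x) for x in row) for row in m)


def spectral_radius(m, squarings=30):
    """Estimate rho(M) as ||M^K||^(1/K) with K = 2^squarings.

    The current power is kept normalized and the logarithm of the removed
    scalar is tracked separately, so nothing overflows or underflows.
    """
    power = [row[:] for row in m]
    log_scale = 0.0  # invariant: M^(2^j) == exp(log_scale) * power
    for _ in range(squarings):
        norm = row_sum_norm(power)
        if norm == 0.0:
            return 0.0  # some power of M vanishes, so M is nilpotent
        power = [[x / norm for x in row] for row in power]
        log_scale = 2.0 * (log_scale + math.log(norm))
        power = mat_mul(power, power)
    norm = row_sum_norm(power)
    if norm == 0.0:
        return 0.0
    return math.exp((log_scale + math.log(norm)) / 2.0 ** squarings)


def is_subcritical(m):
    """True when the estimated spectral radius is strictly below 1."""
    return spectral_radius(m) < 1.0


if __name__ == "__main__":
    print(spectral_radius([[0.2, 0.3], [0.4, 0.1]]))
    print(is_subcritical([[0.2, 0.3], [0.4, 0.1]]))
```

Gelfand's formula is used here instead of plain power iteration because it also handles periodic matrices; values very close to 1 should be treated with care since the estimate carries a small numerical error.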

This generalization provides a glimpse of the range of applications of the
techniques of this paper. Theorems 2 and 3 can also be generalized, yet the
generalization in that context involves more nuances.



7.2. Stationary initial conditions as a generalization. We observe
that we can start from many stationary initial conditions for our process,
whether it is defined in the regular way or in the thinning-thickening way.
We can also reprove all the theorems if we take the initial condition
to be some $\mu_0 + f$, where $\mu_0$ is translation invariant and $f$ is the initial
condition relative to that $\mu_0$. One useful
application of this is a Hawkes-like linear process where the roots of the trees
have some non-Poisson but stationary behaviour while the trees are as in the
Hawkes process.